Here is an example of possible answers to the practical on fitting the deterministic SEITL model to the Tristan da Cunha outbreak.

Each section below correspond to a section of the practical. Thus, you can have a look at our example for one section and then go back to the practical to answer the following sections.

Although our example refers to the SEITL model, the same commands work for the SEIT4L model (i.e. data(SEIT4L_deter) instead of data(SEITL_deter)).

3 Short run analysis

Here is an example of analysis for our preliminary run:

Although the chain was started at a init.theta with a low posterior density, it quickly finds the region of the parameter space with high posterior density. Note also the constant trace of the log-prior since we have assumed a uniform prior.

Overall, it looks like the chain reached its target distribution after 1000 steps.

As anticipated from the trace, burning the first 1000 iterations maximizes the effective sample size (ESS).

Although we have 4000 samples remaining after burning, the ESS is much smaller. This is due to autocorrelation of the chain.

The autocorrelation between samples drops substantially for a lag of 20 iterations. We can thin the trace to reduce the autocorrelation.

Although the thinned trace has 20 times less fewer than the unthinned trace, it has a similar ESS. This is because the autocorrelation has been reduced.

Let’s compare the posterior estimates of the thinned and unthinned traces.

# The unthinned trace
summary(trace.burn)
## 
## Iterations = 1:4001
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 4001 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                     Mean      SD  Naive SE Time-series SE
## R0               17.2400 3.86076 0.0610363       0.293873
## D_lat             1.2168 0.33682 0.0053249       0.040576
## D_inf            10.5268 2.27555 0.0359751       0.171013
## alpha             0.5394 0.03628 0.0005736       0.003276
## D_imm             8.0693 1.97092 0.0311591       0.195471
## rho               0.6998 0.04898 0.0007744       0.004439
## log.prior       -12.8145 0.00000 0.0000000       0.000000
## log.likelihood -141.2841 1.68954 0.0267106       0.143252
## log.posterior  -154.0985 1.68954 0.0267106       0.143252
## 
## 2. Quantiles for each variable:
## 
##                     2.5%       25%       50%      75%     97.5%
## R0                9.8968   14.4880   16.8702   19.586   24.8133
## D_lat             0.5709    0.9951    1.1892    1.423    1.9408
## D_inf             5.9743    8.8864   10.6053   12.226   14.4630
## alpha             0.4677    0.5150    0.5406    0.565    0.6044
## D_imm             4.4876    6.5216    7.8916    9.520   12.0038
## rho               0.6082    0.6674    0.6995    0.735    0.7919
## log.prior       -12.8145  -12.8145  -12.8145  -12.814  -12.8145
## log.likelihood -145.3254 -142.3281 -141.0061 -139.928 -138.9663
## log.posterior  -158.1398 -155.1425 -153.8206 -152.743 -151.7808

# The thinned trace
summary(trace.burn.thin)
## 
## Iterations = 1:191
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 191 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                     Mean      SD Naive SE Time-series SE
## R0               17.3988 3.85418 0.278878       0.336399
## D_lat             1.2195 0.33718 0.024398       0.041233
## D_inf            10.6031 2.24883 0.162720       0.162720
## alpha             0.5389 0.03511 0.002541       0.003768
## D_imm             8.0310 1.98373 0.143538       0.187058
## rho               0.6987 0.04730 0.003423       0.004818
## log.prior       -12.8145 0.00000 0.000000       0.000000
## log.likelihood -141.2704 1.69755 0.122830       0.144851
## log.posterior  -154.0848 1.69755 0.122830       0.144851
## 
## 2. Quantiles for each variable:
## 
##                     2.5%       25%       50%       75%     97.5%
## R0               10.0298   14.6419   17.1295   19.6005   24.7571
## D_lat             0.5684    1.0116    1.1961    1.4486    1.9094
## D_inf             6.3118    9.1569   10.5184   12.4720   14.2501
## alpha             0.4685    0.5176    0.5402    0.5620    0.5992
## D_imm             4.4508    6.5154    7.8623    9.5462   11.7747
## rho               0.6094    0.6675    0.7011    0.7334    0.7807
## log.prior       -12.8145  -12.8145  -12.8145  -12.8145  -12.8145
## log.likelihood -145.3972 -142.1858 -140.9833 -139.8503 -138.9644
## log.posterior  -158.2117 -155.0003 -153.7978 -152.6648 -151.7789

They are very similar. So why thin? Because autocorrelation produces clumpy samples that are unrepresentative, in the short run, of the true underlying posterior distribution. We can check this by comparing the thinned and unthinned distributions using the function plotPosteriorDensity of the fitR package:

The thinned trace shows a smoother distribution despite having less samples than the unthinned one. This because the local “bumps” of the unthinned distribution are caused by autocorrelated samples.

You can now go back to the practical and perform a similar analysis for a long-run MCMC.

4 Long run analysis

Here is an example of an analysis for our long run (\(10^5\) iterations)

# load mcmc output
data(mcmc_TdC_deter_longRun)

# create mcmc objects for both traces
library("coda")
trace1 <- mcmc(mcmc_SEITL_theta1$trace)
trace2 <- mcmc(mcmc_SEITL_theta2$trace)

# combine traces as mcmc.list object
trace <- mcmc.list(list(trace1, trace2))

# let's have a look
head(trace, 3)
## [[1]]
## Markov Chain Monte Carlo (MCMC) output:
## Start = 1 
## End = 4 
## Thinning interval = 1 
##         R0    D_lat    D_inf    alpha    D_imm       rho log.prior
## 1 2.000000 2.000000 2.000000 0.800000 16.00000 0.8500000 -12.81448
## 2 2.000000 2.000000 2.000000 0.800000 16.00000 0.8500000 -12.81448
## 3 3.291068 1.927748 2.139757 0.824364 19.14906 0.8684192 -12.81448
## 4 3.644072 1.868155 1.899170 0.809881 18.47234 0.8769007 -12.81448
##   log.likelihood log.posterior
## 1      -445.7795     -458.5939
## 2      -445.7795     -458.5939
## 3      -442.6925     -455.5070
## 4      -438.2018     -451.0162
## 
## [[2]]
## Markov Chain Monte Carlo (MCMC) output:
## Start = 1 
## End = 4 
## Thinning interval = 1 
##         R0   D_lat    D_inf      alpha    D_imm       rho log.prior
## 1 20.00000 2.00000 2.000000 0.10000000 8.000000 0.3000000 -12.81448
## 2 20.00000 2.00000 2.000000 0.10000000 8.000000 0.3000000 -12.81448
## 3 18.71727 2.70723 1.014131 0.07367196 9.405255 0.2468698 -12.81448
## 4 18.71727 2.70723 1.014131 0.07367196 9.405255 0.2468698 -12.81448
##   log.likelihood log.posterior
## 1      -321.5801     -334.3946
## 2      -321.5801     -334.3946
## 3      -320.1123     -332.9268
## 4      -320.1123     -332.9268
## 
## attr(,"class")
## [1] "mcmc.list"

# acceptance rate
1 - rejectionRate(trace)
##             R0          D_lat          D_inf          alpha          D_imm 
##        0.20254        0.20254        0.20254        0.20254        0.20254 
##            rho      log.prior log.likelihood  log.posterior 
##        0.20254        0.00000        0.20254        0.20254
# close to the optimal value of 0.234

# ESS
effectiveSize(trace)
##             R0          D_lat          D_inf          alpha          D_imm 
##       5437.065       6163.158       6093.018       7106.347       6806.269 
##            rho      log.prior log.likelihood  log.posterior 
##       7453.447          0.000       5458.377       5458.377

# plot the traces
library("lattice")  ## for the 'xyplot' command
xyplot(trace)

Note that the acceptance rate and the ESS are computed for the combined chain whereas the traces are plotted for each chain. Also, given the very high ESS we can reasonably choose a burn-in visually, say 5000 iterations.

Again, given the very high ESS, we can be quite generous in our choice of the thinning.

However, let’s compare the thinned and unthinnned distributions.

In contrast to the previous short-run, they are almost no difference between the thinned and unthinned chains. Indeed, with such a long chain, the clumpy autocorrelation has been averaged out!

In fact, there are several references that show that the longer (unthinned) chain usually yields better estimates of the true posterior than the shorter thinned chain, even for percentiles in the tail of the distribution. That said, thinning can be useful for other reasons, such as memory or time constraints in post-chain processing.

Now, we can compare whether the two independent chains, started at theta1 and theta2, have converged to the same posterior distribution

Since the chains have converged to the same posterior, we can use the combined estimates

Running several independent chains starting from different parts of the parameter space allows us to check whether the posterior distribution is multi-modal. If so, then we must be careful when combining the chains. For instance, an estimate of the mean computed with summary won’t be meaningful for a parameter with a multi-modal posterior.

By contrast, for a unimodal posteriors, combining chains is an efficient way to increase the ESS and the precision of the posterior estimates. Furthermore, running several “shorter” chains in parallel is faster than running one “long” chain.

Finally, let’s assess the fit of the deterministic SEITL model.

Note that the 95% credible intervals (CI) for the posterior fit under the MAP captures the highest data point. By contrast, the fit of the second peak seems quite poor, even for the MAP.

You can now go back to the practical and look at the posterior correlations between the parameters.

5 Correlations

The correlation of the posterior distribution can be investigated using levelplot.

Note the strong positive correlations (~0.8) between \(R_0\) and \(D_{lat}\) and between \(R_0\) and \(D_{inf}\). In order to explain the wide 95% CIs of \(R_0\) and \(D_{inf}\), let’s have a look at the contact rate \(\beta = R_0/D_{inf}\).

The posterior value of \(\beta\) varies somewhat less than the posterior values of \(R_0\) and \(D_\mathrm{inf}\). Indeed, this parameter is constrained by the shape of the initial phase of the outbreak. Conversely, there are an infinite number of combinations of \(R_0\) and \(D_{inf}\) that lead to the same \(\beta\), hence their wide 95% CIs.

A second effect that could explain the wide posterior density in \(R_0\) is the very high attack rate. Indeed, once \(R_0>5\) it doesn’t make much difference whether \(R_0\) is equal to, say, 10 or 20.

We can also note that the posterior estimate of \(D_{inf} = 11\) days (95% CI: \([6-15]\)) is biologically unrealistic based on previous empirical estimates. However, our approach did not include any prior information as the default SEITL_deter fitmodel comes with uniform priors for all parameters.

In order to include previous empirical information on \(D_{lat}\) and \(D_{inf}\), let’s modify the dprior function of SEITL_deter as follows:

Note the choice of a truncated normal distribution since \(D_{lat}\) and \(D_{inf}\) must be positive.

You can now go back to the practical and run a MCMC with this informative prior.

6 Informative priors

Here we combine both chains with informative priors and compare the posterior distribution with the one above.

\(R_0\) and \(D_{inf}\) have very different posterior. This is expected since there is an informative prior on \(D_{inf}\) and \(R_0\) is strongly correlated to \(D_{inf}\). Note also that the mode of all other parameters have changed, though less than \(D_{inf}\) and \(R_0\). This illustrate the influence that one prior can have on the full posterior distribution.

You can now go back to the practical.